Evidence for the Interpretation and Use of Scores from an Automated Essay Scorer

Author

  • Paul Nichols
Abstract

This paper examined validity evidence for scores produced by the Intelligent Essay Assessor (IEA), an automated essay-scoring engine developed by Pearson Knowledge Technologies. The study followed the validity framework described by Yang et al. (2002), which delineates three approaches to validation: examining the relationships among scores given to the same essays by different scorers, examining the relationships between essay scores and external measures, and examining the scoring processes used by the IEA. The relationships among scores given to the same essays by different scorers (measured by percent agreement, Spearman rank-order correlation, the kappa statistic, and Pearson correlation) were stronger between two human readers than between the IEA and a human reader, but stronger between the IEA and experts than between human readers and experts. In addition, examination of the IEA's scoring processes showed that they were similar to those of human scorers, and more similar to the processes of proficient human scorers than to those of non-proficient or intermediate scorers. These results provide positive evidence for the use of IEA scores as measures of writing achievement. Further research with the IEA on other assessments and at other grade levels would help generalize these results and further strengthen the validity evidence for using the IEA to score writing assessments.
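The abstract names four indices of agreement between scorers without defining them. As an illustration only, the sketch below computes them for two scorers who rate the same essays. It assumes integer holistic scores and an unweighted Cohen's kappa (the study may have used a different kappa variant), the scores in it are hypothetical rather than data from the study, and the helper names are made up for this example; it is not the IEA's or the author's procedure.

```python
from collections import Counter
from math import sqrt


def percent_agreement(a, b):
    """Proportion of essays on which the two scorers gave exactly the same score."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cohens_kappa(a, b):
    """Unweighted Cohen's kappa (assumed variant): exact agreement corrected for chance."""
    n = len(a)
    p_o = percent_agreement(a, b)
    count_a, count_b = Counter(a), Counter(b)
    # Chance agreement: sum over score points of the product of both scorers' marginal proportions.
    p_e = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    return (p_o - p_e) / (1 - p_e)


def pearson(a, b):
    """Pearson product-moment correlation between two score lists."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / sqrt(var_a * var_b)


def average_ranks(values):
    """Rank values 1..n, giving tied values the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of tied values starting at sorted position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j are 0-based, ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(a, b):
    """Spearman rank-order correlation: Pearson correlation of the tie-averaged ranks."""
    return pearson(average_ranks(a), average_ranks(b))


if __name__ == "__main__":
    # Hypothetical 1-6 scores for ten essays from two scorers; NOT data from the study.
    scorer_1 = [4, 3, 5, 2, 4, 3, 6, 1, 4, 5]
    scorer_2 = [4, 3, 4, 2, 5, 3, 6, 2, 4, 5]
    print(f"percent agreement: {percent_agreement(scorer_1, scorer_2):.2f}")
    print(f"Cohen's kappa:     {cohens_kappa(scorer_1, scorer_2):.2f}")
    print(f"Pearson r:         {pearson(scorer_1, scorer_2):.2f}")
    print(f"Spearman rho:      {spearman(scorer_1, scorer_2):.2f}")
```

Percent agreement and kappa treat scores as categories and credit only exact matches, while the Pearson and Spearman correlations also credit scores that rise and fall together without matching exactly, so the four indices can order pairs of scorers differently.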


Similar sources

Automated Scoring for Creative Problem Solving Ability with Ideation-Explanation Modeling

This paper describes an automated scorer for assessing students’ Creative Problem-Solving (CPS) abilities via modeling the intra-structure of students’ essays describing their thoughts on solving particular problems. The automated scorer aims to grade students’ open-ended responses to an essay-question-type CPS ability test, instead of using typical Likert-type or multiple-choice questions that...


Stumping e-rater: challenging the validity of automated essay scoring

This report presents the findings of a research project funded by and carried out under the auspices of the Graduate Record Examinations Board. Researchers are encouraged to express freely their professional judgment. Therefore, points of view or opinions stated in Graduate Record Examinations Board Reports do not necessarily represent official Graduate Record Examinations Board position or poli...


Framing Bias in the Interpretation of Quality Improvement Data: Evidence From an Experiment

Background A growing body of public management literature sheds light on potential shortcomings to quality improvement (QI) and performance management efforts. These challenges stem from heuristics individuals use when interpreting data. Evidence from studies of citizens suggests that individuals’ evaluation of data is influenced by the linguistic framing or context of that information an...


Dengue with Normal Platelet Count and no Hemoconcentration: Automated Hematogram in Cases with Underlying Thalassemia

Dear Editor, Dengue is an important arbovirus infection. This infection can result in an acute febrile illness. The important hematological abnormalities included hemoconcentration and thrombocytopenia (1). Due to the decreased platelet count, the patient might develop petechiae and hemorrhagic complication. In endemic area, the presumptive diagnosis of dengue is usually derived by the cl...


Task Type and Prompt Effect on Test Performance: A Focus on IELTS Academic Writing Tasks

Recent versions of international high-stakes tests like TOEFL and IELTS have made use of integrated tasks in addition to the traditional independent tasks in a claim to provide a more realistic estimation of the test takers’ language abilities. The present study aimed to investigate how test takers’ performance may differ on such tasks. As such, the test takers’ performance was compared on IELT...




Publication date: 2004